HIT2016@DPIL-FIRE2016: Detecting Paraphrases in Indian Languages based on Gradient Tree Boosting

Authors

Leilei Kong

Kaisheng Chen

Liuyang Tian

Zhenyuan Hao

Zhongyuan Han

Haoliang Qi

Abstract

Paraphrase detection is an important and challenging task. It can be used in paraphrase generation and extraction, machine translation, question answering, and plagiarism detection. Because a paraphrase expresses the meaning of a sentence in another sentence using different words, traditional methods based on lexical similarity are ineffective. In this paper, we describe our strategy for Detecting Paraphrases in Indian Languages, a workshop track proposed by the Forum for Information Retrieval Evaluation (FIRE) 2016. We formalize the task as a classification problem and use a supervised learning method based on Gradient Tree Boosting to classify the types of paraphrase plagiarism. Inspired by the Meteor evaluation metric for machine translation, we use Meteor-like features for the classifier. In the evaluation, our approach achieved the highest Overall Score (0.77), the highest F1 measure for both Task1 and Task2 on Malayalam and Tamil, and the highest F1 measure on Punjabi Task2 in the FIRE 2016 Detecting Paraphrases in Indian Languages task.

CCS Concepts: Information systems → Information retrieval
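The abstract does not list the exact features, so the following is only a minimal sketch of what "Meteor-like" features for a sentence pair might look like. It assumes whitespace tokenization, exact-match unigram alignment, and the parameter values of the original Meteor metric (recall-weighted harmonic mean 10PR/(R+9P) and fragmentation penalty 0.5·(chunks/matches)³). The function name `meteor_like_features` and the choice to treat the first sentence as the hypothesis and the second as the reference are illustrative assumptions, not the authors' implementation.

```python
def meteor_like_features(sent_a, sent_b):
    """Exact-match, Meteor-style features for a sentence pair (hypothetical helper).

    sent_a plays the role of the hypothesis and sent_b the reference;
    this assignment is an assumption made for illustration.
    """
    a = sent_a.split()
    b = sent_b.split()

    # Greedy left-to-right alignment of identical tokens; each token is used once.
    used = [False] * len(b)
    alignment = []  # pairs (position in a, position in b)
    for i, tok in enumerate(a):
        for j, other in enumerate(b):
            if not used[j] and tok == other:
                used[j] = True
                alignment.append((i, j))
                break

    m = len(alignment)
    if m == 0:
        return {"precision": 0.0, "recall": 0.0, "fmean": 0.0,
                "chunks": 0, "penalty": 0.0, "score": 0.0}

    precision = m / len(a)
    recall = m / len(b)
    # Recall-weighted harmonic mean as in the original Meteor formulation.
    fmean = 10 * precision * recall / (recall + 9 * precision)

    # A chunk is a run of matches that are adjacent and in order in both sentences.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(alignment, alignment[1:]):
        if not (i2 == i1 + 1 and j2 == j1 + 1):
            chunks += 1

    penalty = 0.5 * (chunks / m) ** 3
    return {"precision": precision, "recall": recall, "fmean": fmean,
            "chunks": chunks, "penalty": penalty, "score": fmean * (1 - penalty)}


if __name__ == "__main__":
    print(meteor_like_features("the cat sat", "sat the cat"))
```

A feature dictionary like this could be flattened into a vector and passed, together with the paraphrase label, to any gradient tree boosting implementation (for example scikit-learn's `GradientBoostingClassifier`); which implementation and hyperparameters the authors used is not stated here.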

Similar resources

DPIL@FIRE2016: Overview of the Shared task on Detecting Paraphrases in Indian language

This paper presents an overview of the shared task "Detecting Paraphrases in Indian Languages" (DPIL) conducted at FIRE 2016. Given a pair of sentences in the same language, participants are asked to detect the semantic equivalence between the sentences. The shared task is proposed for four Indian languages, namely Tamil, Malayalam, Hindi and Punjabi. The dataset created for the shared task has...

KS_JU@DPIL-FIRE2016: Detecting Paraphrases in Indian Languages Using Multinomial Logistic Regression Model

In this work, we describe a system that detects paraphrases in Indian languages as part of our participation in the shared task on Detecting Paraphrases in Indian Languages (DPIL) organized by the Forum for Information Retrieval Evaluation (FIRE) in 2016. Our paraphrase detection method uses a multinomial logistic regression model trained with a variety of features which are basically lexical and s...

Anuj@DPIL-FIRE2016: A Novel Paraphrase Detection Method in Hindi Language using Machine Learning

Every language admits several plausible interpretations. With the evolution of the web, smart devices and social media, it has become a challenging task to identify these syntactic or semantic ambiguities. In Natural Language Processing, expressing the same meaning in two statements written using different words is termed paraphrasing. At FIRE 2016, we have worked upon the problem of detecting paraphrase...

CUSAT_TEAM@ DPIL-FIRE2016: Detecting Paraphrase in Indian Languages-Malayalam

This paper describes the work done as part of the shared task on Detecting Paraphrases in Indian Languages (DPIL) in the Forum for Information Retrieval and Evaluation (FIRE 2016). Paraphrase identification is the task of deciding whether two given text fragments have the same meaning. Our detection system is for the Malayalam language and makes use of the cosine similarity measure, an existing state of ...

JU_NLP@DPIL-FIRE2016: Paraphrase Detection in Indian Languages - A Machine Learning Approach

This paper presents our system report on our participation in the shared task on “Detecting Paraphrases in Indian Languages (DPIL)” organized in the “Forum for Information Retrieval Evaluation (FIRE)” 2016, in both tasks (Task1 and Task2) defined in this shared task, in four Indian languages (Tamil, Malayalam, Hindi and Punjabi). We made use of different similarity measures and machine transl...

Publication date: 2016
